Abstract

We review models of biological evolution in which the population fre- 
quency changes deterministically with time. If the population is self- 
replicating, although the equations for simple prototypes can be linearised, 
nonlinear equations arise in many complex situations. For sexual popula- 
tions, even in the simplest setting, the equations are necessarily nonlinear 
due to the mixing of the parental genetic material. The solutions of such 
nonlinear equations display interesting features such as multiple equilib- 
ria and phase transitions. We mainly discuss those models for which an 
analytical understanding of such nonlinear equations is available. 



1 Introduction 

A population evolves when the changes that happen during a generation are 
passed on to the subsequent generations. These changes may happen in the 
somatic immune cells in order to adapt to a microbe attack or in the germline 
cells. Though in both the cases the genome is altered, in the former, it also 
manifests as changes in the composition of the protein coded by that part of 
the genome. Therefore one defines the models describing biological evolution in 
genotype or protein space [44].

The quantity of interest is the population frequency of a genotype which 
changes under the action of two elementary processes namely selection and mu- 
tation. In the simplest setting, the time-dependent equations for the population 
fraction are nonlinear but they can be linearised and the steady state solution 
obtained at long times can be shown to be unique. In more complex situations 
such as when subpopulations are coupled to each other or when the growth rate
of a genotype depends on its current frequency, nonlinear evolution equations
give rise to multiple equilibria. In the cases where the solution is unique, phase 
transition may occur in the steady state. If the process of sexual reproduction 
is also included, the resulting equations are bilinear in population and such in- 
herently nonlinear equations exhibit multiple solutions in the steady state and 
dynamic phase transitions. 

In this review, we will focus on infinite populations which obey deterministic 
equations of evolution. Although the real populations are finite and evolve 
stochastically, phenomena observed in deterministic setting may survive in the 
presence of stochasticity as well [30], and deterministic solutions can also be
utilised to get insight into the corresponding stochastic problem [24] and to develop
stochastic theories [20]. For a discussion of topics not covered in this article, 
we refer the reader to several excellent textbooks [15l [Til 132 and other review 
articles on the subject [23]. 

The article is organised as follows. In the next section, we introduce some 
basic concepts and definitions. This is followed by a discussion of models for 
asexually reproducing populations in Sec. 3 and sexually reproducing ones in
Sec. 4. Finally, a summary and outlook are presented in Sec. 5.

2 Basic definitions 

In this section, we explain some basic concepts and definitions which are relevant 
to the discussion in the following sections. 

Sequence and sequence space: A sequence $\sigma = \{\sigma_1, ..., \sigma_L\}$ is a string of L
letters which are chosen from an alphabet of size $a$. It represents a protein if
$\sigma_i$ denotes one of the $a = 20$ amino acids and a genotype when the letters are
one of the four nucleotides. The total sequence space consists of all possible
strings of length L and thus has a size $n = a^L$ which increases exponentially
with L. For computational ease, it is useful to lump some of the information
in a single letter. For example, instead of working with all the four nucleotides
in a genotype, one can classify them as purines (adenine and guanine) and
pyrimidines (thymine and cytosine) thus reducing $a$ to two. Similarly instead
of considering all possible mutations at a locus, one may differentiate between
genotypes by the absence or presence of a mutation which again corresponds to
$a = 2$ [50]. In this article, we will work with binary sequences unless specified
otherwise. Such $n = 2^L$ sequences can be arranged on a Hamming space, an
example of which is shown in Fig. 1 for a binary sequence of length L = 3. Two
sequences $\sigma$ and $\sigma'$ are said to be at Hamming distance $d(\sigma, \sigma')$ if they differ at
$d$ loci. For a binary sequence in which $\sigma_i = 0$ or 1, one may write

$$d(\sigma, \sigma') = \sum_{i=1}^{L} (\sigma_i - \sigma'_i)^2 \qquad (1)$$
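The definitions of sequence space and Hamming distance can be made concrete with a short sketch. The function names `sequence_space` and `hamming_distance` are illustrative choices, not notation from the text, and sequences are assumed to be represented as tuples.

```python
from itertools import product


def sequence_space(L, alphabet=(0, 1)):
    """Return all sequences of length L over the given alphabet as tuples."""
    return list(product(alphabet, repeat=L))


def hamming_distance(seq_a, seq_b):
    """Number of loci at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# The binary space for L = 3 has 2**3 sequences (the corners of a cube).
space = sequence_space(3)
print(len(space))
print(hamming_distance((0, 0, 0), (1, 0, 1)))
```

For binary sequences the count of differing loci coincides with the sum of squared differences, since each term is either 0 or 1.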



Fitness: The fitness $W(\sigma)$ of a sequence $\sigma$ is a measure of its reproductive
success in a given environment. It represents the replication rate of a genotype
or the functionality of a protein. The sequence space along with the fitness of
each sequence comprises the fitness landscape. The choice of fitness landscape 
plays an important role in determining the course of evolution and can be made 
according to the biological situation that one wishes to model and the avail- 
able experimental data or the analytical tractability of the problem. A fitness 
landscape can be simple in that the fitness of a sequence depends only on its 
distance from a given sequence. More often however the fitness landscapes are 
complex and one has to specify all the fitnesses. These fitnesses can be assumed
to be independent random variables [22] or they may have correlations
[39] [43]. An important feature of generic fitness landscapes is the presence of
epistasis which is a measure of the nonlinear contribution of locus fitness to the 
sequence [29]. If each locus contributes independently to the sequence fitness, 
a fitness landscape is said to be non-epistatic. Fitness can also depend on time 
as in the case of a changing environment [32] [33] [53] or it can be a function of the
genotype frequency. In this review, we will employ various
types of fitness landscapes. 

Mutation: Stochastic changes known as mutations may happen in the genome 
of an individual. These may insert, delete or change the nucleotides in the 
genome and thus create a new sequence with a different fitness. If the fitness 
of the mutant is higher, the change may propagate in the population and the 
population evolves towards a higher fitness value, otherwise it is eliminated. In 
this review, we will consider only point mutations that change a locus $\sigma_i$ to one
of the other $a - 1$ possibilities with a certain probability and thus preserve the
length of the sequence. 

Recombination: A sequence genetically different from the parents can be 
produced by the recombination process in which two parent sequences mix to 
produce a new offspring sequence thus producing genetic variation within a 
population. Recombination occurs not only during gamete formation in sexu- 
ally reproducing multicellular organisms but in unicellular organisms such as 
bacteria and fungi as well [16]. We will consider a recombination scheme (one-point
crossover) in which the parent sequences $\sigma$ and $\sigma'$ break at a point $i$ and
exchange the genetic material with a certain probability resulting in offspring
sequences $\{\sigma_1, ..., \sigma_i, \sigma'_{i+1}, ..., \sigma'_L\}$ and $\{\sigma'_1, ..., \sigma'_i, \sigma_{i+1}, ..., \sigma_L\}$.
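The one-point crossover scheme can be sketched as follows. The function name `one_point_crossover` and the tuple representation are assumptions made for illustration, and the break point `i` counts the loci kept from the first parent.

```python
def one_point_crossover(parent_a, parent_b, i):
    """Exchange the tails of two equal-length sequences after locus i.

    Returns the two offspring: the first keeps loci 1..i of parent_a and
    takes loci i+1..L from parent_b, the second is the complementary
    combination.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 <= i <= len(parent_a):
        raise ValueError("break point must lie within the sequence")
    child_1 = tuple(parent_a[:i]) + tuple(parent_b[i:])
    child_2 = tuple(parent_b[:i]) + tuple(parent_a[i:])
    return child_1, child_2


print(one_point_crossover((0, 0, 0, 0), (1, 1, 1, 1), 1))
```

In the models discussed here the exchange happens only with a certain probability, so a simulation would call this function with that probability and otherwise return the parents unchanged.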

3 Asexually reproducing populations 

We first describe the equations governing the evolution of self replicating pop- 
ulations. Although the time-dependent equations for the population frequency 
of such asexual populations are nonlinear in general, they can be linearised by a 
transformation of variables in some simple cases [49] [25] . We will mainly discuss 
the steady state properties of these models in the following subsections.

Figure 1: The sequence space for L = 3 represented on a Hamming cube.



3.1 Haploid population 

In a haploid population, each individual carries a single copy of its genome 
sequence $\sigma$. In the presence of selection and mutation, the population frequency
$X(\sigma, t+1)$ of a sequence $\sigma$ at generation $t+1$ can be obtained from each sequence
$\sigma'$ that makes $W(\sigma')$ copies of itself in one generation and mutates to sequence
$\sigma$ with a probability $M(\sigma \leftarrow \sigma')$. This gives the discrete time evolution equation
as

$$X(\sigma, t+1) = \frac{\sum_{\sigma'} M(\sigma \leftarrow \sigma') W(\sigma') X(\sigma', t)}{\sum_{\sigma'} W(\sigma') X(\sigma', t)} \qquad (2)$$

where the denominator on the right hand side (RHS) is the average fitness
and ensures that the normalisation $\sum_{\sigma} X(\sigma, t) = 1$ is satisfied at all times. If
the mutation probability per locus per generation is $\mu$ and the point mutations
occur independently at each locus, the probability that a sequence $\sigma'$ mutates
to sequence $\sigma$ at Hamming distance $d(\sigma, \sigma')$ is given by

$$M(\sigma \leftarrow \sigma') = \mu^{d(\sigma, \sigma')} (1 - \mu)^{L - d(\sigma, \sigma')} \qquad (3)$$

It is evident that equation (2) is nonlinear due to the presence of the denominator.
However in terms of an unnormalised population variable defined as

$$Z(\sigma, t) = X(\sigma, t) \prod_{\tau=0}^{t-1} \sum_{\sigma'} W(\sigma') X(\sigma', \tau) \qquad (4)$$

we find that the unnormalised variables $Z(\sigma, t)$ obey a linear equation given by

$$Z(\sigma, t+1) = \sum_{\sigma'} M(\sigma \leftarrow \sigma') W(\sigma') Z(\sigma', t). \qquad (5)$$

On writing

$$X(\sigma, t) = \frac{Z(\sigma, t)}{\sum_{\sigma'} Z(\sigma', t)} \qquad (6)$$

in (5), equation (2) is obtained. In matrix notation, (5) can be written as
$Z(t + 1) = A Z(t)$ where the $\sigma, \sigma'$ element of matrix $A$ is given by
$M(\sigma \leftarrow \sigma') W(\sigma')$ and $Z(t)$ is the population vector at time $t$. Since the fitness $W(\sigma) > 0$,
the matrix $A$ is non-negative and it follows from the Perron-Frobenius theorem
that the largest eigenvalue of matrix $A$ is real, positive and nondegenerate with
the corresponding eigenvector real and positive [4]. Using this eigenvector in
(6) and taking the infinite time limit, the normalised frequencies in the steady
state can be obtained. However in some cases, it is possible to work directly
with the nonlinear equation (2) in the steady state (see the discussion below).
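The equivalence between iterating the nonlinear map and taking the leading eigenvector of the linear problem can be checked numerically. The following is a minimal sketch for binary sequences of short length, assuming the independent per-locus mutation probability described above; the helper names (`mutation_matrix`, `iterate_quasispecies`, `steady_state`) are illustrative only.

```python
from itertools import product

import numpy as np


def mutation_matrix(L, mu):
    """M[s, s'] = mu**d * (1 - mu)**(L - d), d = Hamming distance of s and s'."""
    seqs = np.array(list(product((0, 1), repeat=L)))
    d = (seqs[:, None, :] != seqs[None, :, :]).sum(axis=2)
    return mu ** d * (1.0 - mu) ** (L - d), seqs


def iterate_quasispecies(M, W, X, steps):
    """Iterate the normalised selection-mutation map for a number of steps."""
    for _ in range(steps):
        X = M @ (W * X)
        # Columns of M sum to one, so X.sum() equals the average fitness.
        X = X / X.sum()
    return X


def steady_state(M, W):
    """Normalised leading eigenvector of A[s, s'] = M[s, s'] * W[s']."""
    A = M * W[None, :]
    vals, vecs = np.linalg.eig(A)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()


M, seqs = mutation_matrix(3, 0.05)
W = np.ones(len(seqs))
W[0] = 2.0  # a single fitter sequence at {0, 0, 0}, chosen for illustration
X_iter = iterate_quasispecies(M, W, np.full(len(seqs), 1.0 / len(seqs)), 500)
print(np.allclose(X_iter, steady_state(M, W)))
```

The full matrix has $2^L \times 2^L$ entries, so this direct approach is only practical for small L.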

In continuous time, one can write down the equation for the rate of change 
$\dot{X}(\sigma, t) = dX(\sigma, t)/dt$ of the fraction $X(\sigma, t)$ of the population with sequence $\sigma$

$$\dot{X}(\sigma, t) = \sum_{\sigma'} M(\sigma \leftarrow \sigma') W(\sigma') X(\sigma', t) - \left( \sum_{\sigma'} W(\sigma') X(\sigma', t) \right) X(\sigma, t) \qquad (7)$$

where the last term on the RHS is the death term which accounts for the
normalisation $\sum_{\sigma} X(\sigma, t) = 1$. Note that (7) is not the continuous time limit of
(2) although both equations have the same steady state.

The equations (2) and (7) define respectively the discrete and continuous
time versions of Eigen's quasispecies model [12] [13]. The main result of the
quasispecies theory is that in the steady state, for several choices of fitness
landscapes, there exists a critical mutation rate below which the population forms
a quasispecies consisting of the fittest sequence and its closely related mutants.
Above this error threshold, the population is homogeneously distributed over 
the entire sequence space. To illustrate this, we consider the sharp peak fitness 
landscape defined by 

$$W(\sigma) = W_0 \delta_{\sigma,0} + (1 - \delta_{\sigma,0}), \quad W_0 > 1 \qquad (8)$$

where $0 = \{0, 0, ..., 0\}$ is the sequence with all zeros. Using this choice for $W(\sigma)$
in (2) for the sequence 0 in the steady state, we get

$$X(0) = \frac{\sum_{\sigma'} M(0 \leftarrow \sigma') W(\sigma') X(\sigma')}{W_0 X(0) + 1 - X(0)} \qquad (9)$$

In the scaling limit $\mu \to 0$, $L \to \infty$ with $U = \mu L$ finite, the terms in the
numerator on the RHS arising due to mutations to sequence 0 vanish and we obtain

$$X(0) = \frac{W_0 e^{-U} - 1}{W_0 - 1}, \quad U < U_c = \ln W_0 \qquad (10)$$

Thus the master sequence supports a finite fraction of the population below $U_c$.
Above the critical value $U_c$, the population is homogeneously distributed
over the sequence space.
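The error threshold on the sharp peak landscape can be illustrated by evaluating the master sequence fraction in the scaling limit. This sketch assumes that only the unmutated master sequence, which survives with probability $e^{-U}$, contributes to the numerator of the steady state equation; the function name `master_fraction` is hypothetical.

```python
import math


def master_fraction(U, W0):
    """Scaling-limit fraction at the master sequence on a sharp peak landscape.

    U is the mutation rate per genome and W0 > 1 the peak fitness. Above the
    threshold U_c = ln(W0) the fraction is zero in this limit.
    """
    if W0 <= 1.0:
        raise ValueError("the peak fitness W0 must exceed 1")
    U_c = math.log(W0)
    if U >= U_c:
        return 0.0
    return (W0 * math.exp(-U) - 1.0) / (W0 - 1.0)


for U in (0.0, 0.3, 0.6, 0.8):
    print(U, master_fraction(U, 2.0))
```

The fraction decreases continuously from one at $U = 0$ to zero at $U_c$, which for $W_0 = 2$ lies at $\ln 2 \approx 0.69$.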

Not all fitness landscapes exhibit error threshold transition [51]. One such
example is the non-epistatic multiplicative fitness landscape defined by

$$W(\sigma) = \prod_{i=1}^{L} (1 - s)^{\sigma_i} \qquad (11)$$

where $0 < s < 1$ is a selection parameter. It can be checked that the exact
steady state frequency is given by [54],

$$X(\sigma) = \prod_{i=1}^{L} x_0^{1 - \sigma_i} x_1^{\sigma_i} \qquad (12)$$

where $x_0$, $x_1$ are the solutions of (2) for the corresponding one locus model. For
a discussion of error threshold transition on other fitness landscapes, we refer
the reader to [23].
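The factorisation of the steady state on the multiplicative landscape can be verified numerically for a short sequence. The sketch below assumes binary loci with independent mutation probability `mu` per locus and compares the product of one-locus solutions with the leading eigenvector of the full problem; all function names are illustrative.

```python
from itertools import product

import numpy as np


def leading_vector(A):
    """Leading eigenvector of a non-negative matrix, normalised to sum one."""
    vals, vecs = np.linalg.eig(A)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()


def one_locus_solution(mu, s):
    """Steady state (x0, x1) of the one-locus model with fitnesses 1 and 1 - s."""
    A1 = np.array([[1.0 - mu, mu * (1.0 - s)],
                   [mu, (1.0 - mu) * (1.0 - s)]])
    return leading_vector(A1)


def product_solution(L, mu, s):
    """Product of one-locus frequencies over all binary sequences of length L."""
    x0, x1 = one_locus_solution(mu, s)
    seqs = product((0, 1), repeat=L)
    return np.array([np.prod([x1 if b else x0 for b in seq]) for seq in seqs])


def full_solution(L, mu, s):
    """Steady state obtained from the full 2**L dimensional matrix."""
    seqs = np.array(list(product((0, 1), repeat=L)))
    d = (seqs[:, None, :] != seqs[None, :, :]).sum(axis=2)
    M = mu ** d * (1.0 - mu) ** (L - d)
    W = (1.0 - s) ** seqs.sum(axis=1)
    return leading_vector(M * W[None, :])


print(np.allclose(product_solution(3, 0.05, 0.2), full_solution(3, 0.05, 0.2)))
```

The agreement reflects the fact that, for independent loci, the full matrix is a Kronecker product of one-locus matrices, so its leading eigenvector is the product of the one-locus eigenvectors.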

If the replication and mutation are treated as independent processes unlike
in (2) and (7), we obtain the Crow-Kimura model [9, 8] in which it is assumed
that the replication process is error-free and mutations occur due to external
factors such as radiation. Then the equation for the rate of change $\dot{X}(\sigma, t)$ can
be written as [9, 1]

$$\dot{X}(\sigma, t) = \left[ W(\sigma) - \sum_{\sigma'} W(\sigma') X(\sigma', t) \right] X(\sigma, t) + \sum_{\sigma'} M(\sigma \leftarrow \sigma') X(\sigma', t) \qquad (13)$$



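where the mutation matrix is given by

$$M(\sigma \leftarrow \sigma') = \begin{cases} 0, & d(\sigma, \sigma') > 1 \\ \mu, & d(\sigma, \sigma') = 1 \\ -L\mu, & d(\sigma, \sigma') = 0 \end{cases} \qquad (14)$$

since $\sum_{\sigma} M(\sigma \leftarrow \sigma')$ should be zero. As in Eigen's model, the nonlinearity in
(13) can be eliminated by passing to unnormalised population variables $Z(\sigma, t)$
defined by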



$$Z(\sigma, t) = X(\sigma, t) \exp\left[ \int_0^t d\tau \sum_{\sigma'} W(\sigma') X(\sigma', \tau) \right] \qquad (15)$$

The error threshold transition for various fitness landscapes has been demonstrated
using the Crow-Kimura equation (13) also [2] [41].



3.2 Diploid population 

Higher organisms such as humans are diploid as they carry two copies of their
genome and we represent an individual of a diploid population by $(\sigma, \sigma')$. A
sequence is said to be homozygous if $\sigma$ and $\sigma'$ are identical and heterozygous
otherwise. Selection-mutation equations analogous to the haploid case can be
written for the population frequency $X(\sigma, t)$ of the sequence $\sigma$. For the
Crow-Kimura model, the evolution equation reads as [52] [3]

$$\dot{X}(\sigma, t) = \left[ \mathcal{W}(\sigma, t) - \sum_{\sigma''} \mathcal{W}(\sigma'', t) X(\sigma'', t) \right] X(\sigma, t) + \sum_{\sigma'} M(\sigma \leftarrow \sigma') X(\sigma', t) \qquad (16)$$

where $\mathcal{W}(\sigma, t) = \sum_{\sigma'} W(\sigma, \sigma') X(\sigma', t)$ is the marginal fitness of sequence $\sigma$ and
$W(\sigma, \sigma')$ is the fitness of genotype $(\sigma, \sigma')$. A transformation similar to (15)
which can render the above system of nonlinear equations linear is not known
and the steady state solution may not be unique.

The existence of multiple steady state solutions can be illustrated by a
diploid analogue of the sharp peak fitness landscape defined as [52]

$$W(0, 0) = f_0 = 1 + 2s \qquad (17)$$
$$W(0, \sigma) = W(\sigma, 0) = f_1 = 1 + 2hs, \quad \sigma \neq 0 \qquad (18)$$
$$W(\sigma, \sigma') = 1, \quad \sigma, \sigma' \neq 0 \qquad (19)$$

where $s, h > 0$. In the above equations, $s$ is a selection coefficient and $h$ is a
dominance parameter which controls the contribution of the master sequence
to the fitness of the heterozygote. When $h = 1$, since the fitness $W(0, \sigma) =
W(0, 0)$, the master sequence is dominant. On the other hand, when $h = 0$,
the fitness $W(0, \sigma) = W(\sigma, \sigma') = 1$ and therefore the master sequence acts
recessively. The dominance is absent when $h = 1/2$ as the heterozygote fitness
$W(0, \sigma) = 1 + s$ is the average of the master fitness and the mutant fitness.
Using the above equations, the marginal fitness can be written as

$$\mathcal{W}(\sigma) = \begin{cases} f_0 X(0) + f_1 (1 - X(0)), & \sigma = 0 \\ f_1 X(0) + (1 - X(0)), & \sigma \neq 0 \end{cases} \qquad (20)$$

and the average fitness as

$$\sum_{\sigma''} \mathcal{W}(\sigma'') X(\sigma'') = X(0) [f_0 X(0) + f_1 (1 - X(0))] + (1 - X(0)) [f_1 X(0) + 1 - X(0)] \qquad (21)$$

Since the fitness landscape (17)-(19) depends only on the Hamming distance
from the master sequence 0, one can work with the error class frequencies $Y(d)$
which are obtained by summing over the population fractions at Hamming
distance $d$ from the master sequence. Specialising to $h = 0$, the steady state
equation in terms of $Y$'s reads as [52]

$$2 s Y(0) [\delta_{d,0} - Y(0)] Y(d) + \sum_{d'=0}^{L} M(d \leftarrow d') Y(d') = 0 \qquad (22)$$

where the mutation matrix $M$ can be found using (14). The frequency $Y(0)$
obeys a polynomial equation of degree at most $2(L + 1)$. For small L, the above
set of nonlinear equations can be straightforwardly solved. For L = 4, the
fraction $Y(0)$ obeys a polynomial equation $P(Y(0)) = 0$ of degree 9 [52]. The
polynomial $P(Y(0))$ is plotted against $Y(0)$ in Fig. 2 for various $s/\mu$ to show the
occurrence of multiple steady state solutions. Which of these multiple solutions
occurs depends on the initial conditions. For example, an initial distribution with
$Y(0) = 1$ gives a different steady state fitness from the initial condition $Y(L) = 1$.

Figure 2: Plot of the polynomial $P(Y(0))$ as a function of $Y(0)$ for various $s/\mu$
(see Sec. 3.2). The equilibrium frequency $Y(0)$ is obtained when $P(Y(0)) = 0$.



3.3 Concentration-dependent fitness 

The fitness of a sequence is not always a constant and may depend on the 
concentration of other sequences. In such cases, one ends up with nonlinear 
dynamical equations which cannot be linearised. An example of this scenario 
is the evolution of grammar in a population [28]. It has been proposed [46]
that a set of grammars $G_1, ..., G_n$ are innately available to a learner and the
language is learnt by just listening to the sentences and choosing the correct
grammar.

A grammar that is easily understandable has a greater probability of being 
propagated than the others and hence the fitness indicates its prevalence in the 
population. This is equal to the fraction of sentences and their corresponding 
meanings that is common between that grammar and all others multiplied by 
the population fraction using each grammar. If $w(i, j)$ is the probability that
a speaker of grammar $G_j$ can understand a sentence by a user of grammar $G_i$,
the fitness $W(\{X(i)\})$ of grammar $G_i$ can be given as [28]

$$W(\{X(i)\}) = \frac{1}{2} \sum_{j=1}^{n} [w(i, j) + w(j, i)] X(j) \qquad (23)$$

If the probability that a person learning from a teacher speaking grammar $G_i$
ends up with grammar $G_j$ is $M(j \leftarrow i)$, the rate of change of the population
speaking $G_j$ can be written as

$$\dot{X}(j, t) = \sum_{i=1}^{n} M(j \leftarrow i) W(\{X(i)\}) X(i, t) - \left( \sum_{i=1}^{n} W(\{X(i)\}) X(i, t) \right) X(j, t) \qquad (24)$$

Figure 3: Multiple solutions of population frequency $X$ for $w = 0.2$ and $n = 5$
(see Sec. 3.3). Unstable solutions are indicated by broken lines and stable ones
by solid lines.



The interpretation of the terms in the above equation is similar to (2) or (7).
However an important difference is that the fitness $W(\{X(i)\})$ of the grammar
$G_i$ now depends on the frequency of the other grammars as well. Such a
selection-mutation equation with concentration-dependent fitness is known as 
replicator-mutator equation [35]. 

Assuming that the error to any grammar is equally likely, it follows that
$M(j \leftarrow i) = q \delta_{ij} + [(1 - q)/(n - 1)] (1 - \delta_{ij})$ where $q = 1 - \mu$ is the learning
accuracy. A detailed analysis of the above equation is possible for the fitness
choice [28]:

$$w(i, j) = w(j, i) = w \quad \text{for } i \neq j \qquad (25)$$
$$w(i, i) = 1 \qquad (26)$$

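The stable fixed points for the system of equations given by (24) can be found
by setting the left hand side to be zero and choosing all grammars except one,
say $X(1) = X$, to be equally used so that $X(i) = (1 - X)/(n - 1)$, $i \neq 1$. This
reduces the equation for $X(1)$ to

$$\left( q - \frac{1 - q}{n - 1} \right) [w + (1 - w) X] X + \left( \frac{1 - q}{n - 1} - X \right) \left[ w + (1 - w) \left( X^2 + \frac{(1 - X)^2}{n - 1} \right) \right] = 0 \qquad (27)$$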

The above cubic equation for $X$ has three solutions namely $X_0$, $X_+$ and $X_-$
as shown in Fig. 3. The solution $X_0$ corresponds to the case in which all the
grammars are equally used and exists for all $0 < q < 1$. The other two solutions
$X_\pm$ appear beyond a critical learning accuracy $q_c$ and correspond to the most
used ($X_+$) and the least used ($X_-$) grammars. Using a linear stability analysis it
can be shown that the stability of these solutions falls in three regimes depending
on the learning accuracy $q$: when $q < q_c$, the fraction $X_0$ is the only solution
and is stable, whereas in the range $q_c \leq q < q_s$ all the three solutions exist but
$X_-$ is unstable and finally when $q > q_s$, the fraction $X_0$ also loses stability and
$X_+$ is the only stable solution.
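The two extreme regimes can be explored by integrating the replicator-mutator dynamics numerically. The sketch below uses a simple Euler scheme with the symmetric choice $w(i,j) = w$, $w(i,i) = 1$ and uniform learning errors; the step size, the initial condition and the two values of the learning accuracy are arbitrary illustrative choices, and the function names are not from the text.

```python
import numpy as np


def grammar_fitness(X, w):
    """Fitness of each grammar for w(i, i) = 1 and w(i, j) = w otherwise."""
    return w * X.sum() + (1.0 - w) * X


def learning_matrix(n, q):
    """M[j, i] = q if i == j, else (1 - q) / (n - 1)."""
    u = (1.0 - q) / (n - 1)
    return u * np.ones((n, n)) + (q - u) * np.eye(n)


def integrate(X, w, q, dt=0.05, steps=20000):
    """Euler integration of the replicator-mutator equation."""
    X = np.asarray(X, dtype=float)
    M = learning_matrix(len(X), q)
    for _ in range(steps):
        W = grammar_fitness(X, w)
        X = X + dt * (M @ (W * X) - (W @ X) * X)
    return X


start = np.array([0.9, 0.025, 0.025, 0.025, 0.025])
print(integrate(start, w=0.2, q=0.5))   # low accuracy: all grammars equally used
print(integrate(start, w=0.2, q=0.95))  # high accuracy: one grammar dominates
```

For $n = 5$ and $w = 0.2$, the symmetric reduction at $q = 0.95$ gives a dominant-grammar frequency close to 0.936, which the integration reproduces, while at $q = 0.5$ the population relaxes to the uniform state $1/n$.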

Concentration based fitness is confined not just to languages but is also seen 
in other systems such as host-parasite |6l |^ and immune system-pathogen 
interactions [18, 26]. In these systems, the evolution is not based on the concentration
of the same species populations but on the concentration of other
species. Thus their evolution equations are coupled and this is dealt with in the 
next section. 

3.4 Coupled quasispecies models 

A class of models in which the growth of a population depends on another pop- 
ulation constitute an example of a set of nonlinear evolution equations. Below 
we discuss two such models in some detail. 

Coevolution of quasispecies: When an organism is infected by a virus, the 
immune receptors of the host cell counterattack the virus. There is a one-to-one 
mapping between the virus and the immune receptors so that a viral sequence
$\sigma$ is attacked only by its corresponding receptor sequence $\sigma$, $\sigma'$ only by $\sigma'$ and
so on. In order to escape the immune system, the virus adapts and in response,
the immune system adapts to counter the new viral strain (see Fig. 4) and this
cycle repeats over a time period $\tau$. Thus the viral species and the immune
receptors are involved in a dynamic evolutionary race but may coexist under 
certain conditions as explained below. 

Assuming that both the receptor and viral sequences have the same length L,
the evolution equations for the frequency $X(\sigma, t)$ of immune receptor sequence
$\sigma$ and $x(\sigma, t)$ of the corresponding viral sequence $\sigma$ can be written as [26]

$$\dot{X}(\sigma, t) = \sum_{\sigma'} M_u(\sigma \leftarrow \sigma') W(x(\sigma', t)) X(\sigma', t) - D X(\sigma, t) \qquad (28)$$

$$\dot{x}(\sigma, t) = \sum_{\sigma'} M_\mu(\sigma \leftarrow \sigma') w(\sigma', t) x(\sigma', t) - D(X(\sigma, t)) x(\sigma, t) \qquad (29)$$

where the subscripts in the sequence mutation probability $M$ (see (3)) denote the
mutation probability per locus and the death term of the immune receptor $D =
\sum_{\sigma'} W(x(\sigma', t)) X(\sigma', t)$. As the immune receptor population moves in response
to the viral population, the fitness $W(x(\sigma, t))$ of the receptor $\sigma$ depends on the
concentration of the corresponding viral sequence $\sigma$. In the above equations,
the death terms are different for the two populations as the number of immune
receptors is conserved while the virus number is not. For simplicity, one can
choose the death rate of the virus as

$$D(X(\sigma, t)) = \begin{cases} \delta, & \text{if } \sigma = \text{immune receptor master sequence} \\ 0, & \text{otherwise} \end{cases} \qquad (30)$$

Figure 4: Dynamics of the coevolution of the viral (ovals) and the immune
receptor sequences (rectangles): a) The viral quasispecies is initially formed
around a master sequence 0 surrounded by its mutant sequences. b) The receptor
sequence corresponding to 0 proliferates and forms receptor quasispecies
around 0. c) To escape the attack of the immune system, the viral master sequence
randomly shifts to one of its one-mutant neighbours. d) In response, the master
sequence of the immune receptor also migrates to the location of the new viral
master sequence.

A time-dependent sharp peak fitness landscape is assumed for both immune
receptor and virus as their master sequences move through the sequence space.
Since the viral fitness $w$ is independent of $X(\sigma)$, we can write

$$w(\sigma, t) = \begin{cases} w_0, & \text{if } \sigma = \text{viral master sequence at time } t \\ 1, & \text{otherwise} \end{cases} \qquad (31)$$

where $w_0 > 1$. Similarly $W(x(\sigma, t)) = W_0 > 1$ if $\sigma$ is the viral master sequence
and unity otherwise.

In periodically changing fitness landscapes such as being considered here, 
there is no steady state as the population keeps migrating with the fitness land- 
scape. However one can still define an error threshold in the large time limit 
analogous to that on static fitness landscapes as the maximum mutation rate 
above which the population gets uniformly distributed over the sequence space. 
A possible way to determine the critical mutation rate is to consider the behavior
of the relative frequency $\kappa$ of the new master sequence to the frequency of
a sequence far away from the current master sequence at the time period $\tau$ of
the fitness landscape [32]. At large times, it is a good approximation to assume
that the far-off sequences have reached a quasi-equilibrium and therefore their
unnormalised frequency grows exponentially fast with the growth rate given by
the respective fitness. However such an equilibrium is not reached for the populations
in the vicinity of the (migrating) master sequence and the growth at
such sequences depends on the mutational contribution from the current master
sequence. If the mutation probability or the time period is too small, the population
cannot build up at the new master sequence and the relative frequency
$\kappa < 1$. On the other hand, the new master sequence grows for $\kappa > 1$. Thus
$\kappa = 1$ marks the transition point between the extinction and survival phases of
the quasispecies on periodically changing fitness landscapes.

Following the arguments sketched above, the fraction $\kappa_\mu$ for the virus can
be found and is given by [26]

(32)

The relative frequency of the immune receptors $\kappa_u$ is obtained on replacing
$\mu$ by $u$ and $w_0$ by $W_0$ in the above expression. Setting $\kappa_\mu$ and $\kappa_u$ equal to
one gives a phase diagram in the $\mu$-$u$ plane which shows that while both the
populations exhibit the classical error catastrophe at high mutation rates (as
discussed in Sec. 3.1), the viral population has an additional transition point
when its mutation rate is too low to escape the immune response and in between
these values the two populations coexist [32, 26]. The predicted mutation rates
of the B-cells that produce the immune receptors and the receptor lengths that
maximise both regimes of viral error catastrophe for optimal immune response
are seen to match the experimental observations [26].

Evolution of a mixed population: As discussed in Sec. 3.1, there exists an
error threshold above which the mutational load is too high to be compensated
by selection. For this reason, and because most mutations are known to have
deleterious effect [47] [10], the spontaneous mutation rate is expected to be small
[17]. However small subpopulations of strains with high mutation rates have
been observed in natural isolates [30] and in experiments [45] [5].

Figure 5: Schematic phase diagram of the quasispecies model with nonmutator
and mutator populations. The pure nonmutator phase occurs when $f = 0$, pure
mutator phase for $f > f_c$ and the system is in the mixed phase for $0 < f < f_c$
(see Sec. 3.4).

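Consider such a mixed population with nonmutator and mutator strains
with mutation probability $\mu$ and $u = \lambda \mu$, $\lambda > 1$ respectively. Due to the damage
in error repair systems, the mutation rate of normal strains can rise and hence
a nonmutator can convert to a mutator with probability $f$. Then the average
fraction $x(\sigma, t)$ and $X(\sigma, t)$ of the nonmutator and the mutator respectively
at generation $t$ evolves according to the following coupled nonlinear difference
equations [31]:

$$x(\sigma, t+1) = (1 - f) \frac{\sum_{\sigma'} M_\mu(\sigma \leftarrow \sigma') W(\sigma') x(\sigma', t)}{\mathcal{W}(t)} \qquad (33)$$

$$X(\sigma, t+1) = \frac{\sum_{\sigma'} M_u(\sigma \leftarrow \sigma') W(\sigma') X(\sigma', t)}{\mathcal{W}(t)} + f \frac{\sum_{\sigma'} M_\mu(\sigma \leftarrow \sigma') W(\sigma') x(\sigma', t)}{\mathcal{W}(t)} \qquad (34)$$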
where the average fitness $\mathcal{W}(t) = \sum_{\sigma} W(\sigma) [x(\sigma, t) + X(\sigma, t)]$ and the subscripts
in the mutation matrix refer to the mutation probability per locus per generation.
For the reasons mentioned above, the mutators are selected against and
their number is expected to be low. But with increasing $f$, mutators are continually
generated thus increasing their frequency and at sufficiently high $f$, the
mutator population can reach unity. Thus a phase transition can occur at a critical
probability $f_c$ between the mixed phase with both nonmutator and mutator
population and a pure mutator phase (see Fig. 5). In the steady state, such a
phase transition has been shown to occur on single peak fitness landscapes [i8]
and multiplicative fitness landscapes [31].

To see this transition for fitness choice (11), we first observe that $x(\sigma) = 0$
is a solution of Eq. (33) and thus corresponds to a phase in which the entire
population consists of mutators and the total mutator fraction $X = \sum_{\sigma} X(\sigma) = 1$.
As (34) reduces to (2) in this phase, using the exact solution (12), the average
fitness $\mathcal{W}_>$ in the $f > f_c$ phase can be found. If, on the other hand, the total
nonmutator fraction $x = \sum_{\sigma} x(\sigma)$ is nonzero, on summing over all the sequences
on both sides of Eq. (33), we find that the average fitness $\mathcal{W}_<$ in the mixed phase
corresponding to $f < f_c$ does not depend on the mutator fraction and can be
written as

$$\mathcal{W}_< = (1 - f) \frac{\sum_{\sigma} W(\sigma) x(\sigma)}{\sum_{\sigma} x(\sigma)}, \quad x \neq 0 \qquad (35)$$

thus leading to an uncoupled nonlinear equation for $x(\sigma)$. On eliminating $\mathcal{W}$
from Eq. (33) using the above equation, we see that $x(\sigma) / \sum_{\sigma'} x(\sigma')$ obeys
the quasispecies equation (2) and one can find the average fitness $\mathcal{W}_<$ as well.
Equating the fitnesses $\mathcal{W}_<$ and $\mathcal{W}_>$ at the critical point, the phase boundary in
the $f$-$\lambda$ plane is obtained,

$$(1 - f)^{1/L} = \frac{(2 - s)(1 - u_c) + \sqrt{4 u_c^2 (1 - s) + s^2 (1 - u_c)^2}}{(2 - s)(1 - \mu) + \sqrt{4 \mu^2 (1 - s) + s^2 (1 - \mu)^2}} \qquad (36)$$

Using the above analysis, it is also possible to calculate the average mutator
fraction as a function of $f$ and $\lambda$. The results are seen to be in good agreement
with the experiments on E. coli [31].
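The phase boundary between the mixed and pure mutator phases can be evaluated numerically. The sketch below assumes the multiplicative landscape, for which the steady state average fitness of a pure population is the L-th power of the leading eigenvalue of the one-locus selection-mutation matrix; equating the mutator value with $(1 - f)$ times the nonmutator value then gives the critical conversion probability. The function names are hypothetical.

```python
import math


def one_locus_growth(mu, s):
    """Leading eigenvalue of the one-locus matrix with fitnesses 1 and 1 - s."""
    root = math.sqrt(4.0 * mu ** 2 * (1.0 - s) + s ** 2 * (1.0 - mu) ** 2)
    return 0.5 * ((2.0 - s) * (1.0 - mu) + root)


def critical_conversion_probability(mu, lam, s, L):
    """Conversion probability at which the two average fitnesses coincide.

    mu is the nonmutator mutation probability per locus, lam > 1 the factor
    by which the mutator rate is elevated, s the selection parameter and L
    the sequence length.
    """
    ratio = one_locus_growth(lam * mu, s) / one_locus_growth(mu, s)
    return 1.0 - ratio ** L


print(critical_conversion_probability(mu=0.01, lam=10.0, s=0.1, L=10))
```

For these illustrative parameters the boundary lies near $f \approx 0.27$, and it shrinks to zero as $\lambda \to 1$, where mutators and nonmutators become indistinguishable.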



4 Sexually reproducing populations 

In this section, we mainly consider a recombining haploid population with
sequence length two. As discussed in Sec. 2, due to recombination, the sequences
{0,0} and {1,1} can give rise to offspring sequences {0,1} or {1,0}.
Similarly the recombination between {0,1} and {1,0} can result in {0,0} and
{1,1}. In the following, for brevity we denote the population at the sequences
{0,0}, {0,1}, {1,0} and {1,1} by $X_0$, $X_1$, $X_2$ and $X_3$ and their respective fitness
by $W_0$, $W_1$, $W_2$ and $W_3$. If such a population undergoes recombination alone,
the frequency $X_i(t)$ evolves according to the following equation:

$$X_i(t+1) = \sum_{j,k=0}^{3} R(i \leftarrow j, k) X_j(t) X_k(t) \qquad (37)$$

j,k=0 

where $R(i \leftarrow j, k)$ is the probability that a sequence $i$ is obtained by recombining
sequences $j$ and $k$. The recombination process between suitable sequences is
assumed to occur with probability $r$ and does not occur with $1 - r$. For example,
for the offspring sequence 0, we have

$$R(0 \leftarrow 0, 0) = 1, \quad R(0 \leftarrow 0, 1) = R(0 \leftarrow 0, 2) = \frac{r}{2} + \frac{1 - r}{2} = \frac{1}{2} \qquad (38)$$

$$R(0 \leftarrow 0, 3) = \frac{1 - r}{2}, \quad R(0 \leftarrow 1, 2) = \frac{r}{2} \qquad (39)$$

and the rest of the probabilities are zero. On writing the recombination probabilities
in a similar manner for other sequences and using the normalisation
$\sum_{k=0}^{3} X_k = 1$, we find that the population fractions evolve according to [15]

$$X_0(t+1) = X_0(t) + r (X_1(t) X_2(t) - X_0(t) X_3(t)) \qquad (40)$$
$$X_1(t+1) = X_1(t) + r (X_0(t) X_3(t) - X_1(t) X_2(t)) \qquad (41)$$
$$X_2(t+1) = X_2(t) + r (X_0(t) X_3(t) - X_1(t) X_2(t)) \qquad (42)$$
$$X_3(t+1) = X_3(t) + r (X_1(t) X_2(t) - X_0(t) X_3(t)) \qquad (43)$$

Thus the population fractions obey a set of nonlinear equations when recombi- 
nation is present and it is not known if these equations can be linearised. 

The bilinear frequency combination $X_1(t) X_2(t) - X_0(t) X_3(t)$ is called linkage
disequilibrium $D(t)$ at time $t$ and is a measure of the correlation between the
frequency at the two loci. Using (40)-(43) we have $D(t + 1) = (1 - r) D(t)$ so
that the linkage disequilibrium vanishes in the steady state i.e. $X_1 X_2 = X_0 X_3$
and as a consequence, the frequency of the sequence $\{\sigma_1, \sigma_2\}$ equals the product
of frequency of sequences $\{\sigma_1\}$ and $\{\sigma_2\}$. For example, the frequency of zero
at the first locus equals $X_0 + X_1$ and that at the second locus is $X_0 + X_2$.
Using $D = 0$, it follows that the product $(X_0 + X_1)(X_0 + X_2) = X_0$, the
frequency of the sequence {0,0}. Although the linkage disequilibrium is zero
when only recombination is present, it is usually nonzero when selection and/or
mutation are also included.
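The geometric decay of linkage disequilibrium under recombination alone is easy to check by iterating the two-locus recombination map. In the sketch below, the tuple ordering ({0,0}, {0,1}, {1,0}, {1,1}) and the function names are illustrative conventions.

```python
def linkage_disequilibrium(X):
    """D = X1 * X2 - X0 * X3 for frequencies ordered as 00, 01, 10, 11."""
    return X[1] * X[2] - X[0] * X[3]


def recombine(X, r):
    """One generation of recombination with probability r, no selection or mutation."""
    D = linkage_disequilibrium(X)
    return (X[0] + r * D, X[1] - r * D, X[2] - r * D, X[3] + r * D)


X = (0.5, 0.1, 0.1, 0.3)
r = 0.3
for t in range(5):
    print(t, linkage_disequilibrium(X))
    X = recombine(X, r)
```

Each printed value is $(1 - r)$ times the previous one, while the allele frequencies at each locus, $X_0 + X_1$ and $X_0 + X_2$, stay fixed.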

We now discuss the situation when selection, mutation and recombination 
are present. We will consider the fitness scheme in which two fitness peaks are 
separated by a fitness valley and assume that $W_3 > W_0 = 1 > W_1 = W_2$. In
a population initially localised at {0,0}, a mutation in {0,0} to {0,1} or {1,0}
is deleterious but the fitness loss can be compensated by acquiring another
mutation resulting in the sequence {1,1}. In the absence of recombination
and for small mutation rates, the population will eventually localise around
the fittest {1,1} sequence (see Sec. 3.1). However due to nonlinear evolution
equations, multiple steady states may result [HlIlT]. As discussed below, there
exists a critical recombination rate $r_c$ below which the population can cross
the intervening valley and reach the fittest peak at {1,1}. But above $r_c$, the
population can remain trapped at the initial sequence with low fitness and thus
the sexual reproduction can affect the adaptation process adversely. We now 
describe the population behavior for two schemes of mutation rates. 

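Multiple equilibria in steady state: If the mutation matrix is symmetric and
given by (3), the evolution equations can be written as [35]

$$ X_0(t+1) = X'_0(t) - r(1-2\mu)^2 \frac{D(t)}{\bar{W}^2(t)} \qquad (44) $$

$$ X_1(t+1) = X'_1(t) + r(1-2\mu)^2 \frac{D(t)}{\bar{W}^2(t)} \qquad (45) $$

$$ X_2(t+1) = X'_2(t) + r(1-2\mu)^2 \frac{D(t)}{\bar{W}^2(t)} \qquad (46) $$

$$ X_3(t+1) = X'_3(t) - r(1-2\mu)^2 \frac{D(t)}{\bar{W}^2(t)} \qquad (47) $$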

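where $\bar{W}(t) = \sum_{k=0}^{3} W_k X_k(t)$ is the average fitness of the
population, the linkage disequilibrium
$D(t) = W_0W_3X_0(t)X_3(t) - W_1W_2X_1(t)X_2(t)$ and the primed fractions are
given by the left hand side of (2):

$$ X'_i(t) = \frac{\sum_{j=0}^{3} M_{ij} W_j X_j(t)}{\bar{W}(t)} \qquad (48) $$

with $M_{ij}$ denoting the element of the mutation matrix (3) for a mutation
from sequence $j$ to sequence $i$.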

To arrive at the set of equations (44)-(47), it has been assumed that
recombination occurs after selection and mutation. Thus in the set of equations
(40)-(43), the frequency on the right hand side refers to $X'_i(t)$, upon using
which (44)-(47) are obtained.

In the steady state, for the fitness landscape described above, the fractions
$X_k$'s can be expressed in terms of the fitness $W_k$'s and the average fitness
$\bar{W}$. On using the resulting expressions for $X_k$'s in the equation for
$\bar{W}$, a quartic equation for $\bar{W}$ is obtained. An analysis [38] of
this equation shows that for $r < r_c$, the fittest sequence is always populated
while for $r > r_c$, there are two stable solutions: either the population stays
at the initial sequence {0,0} or moves to the fittest sequence {1,1}.
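
The two stable solutions can be seen by iterating (44)-(47) from different
initial conditions. The following is a minimal sketch under stated assumptions:
the mutation matrix is taken to be $\mu^{d}(1-\mu)^{2-d}$ with $d$ the Hamming
distance between two sequences, and the fitness values, mutation probability,
recombination probabilities and function names are illustrative choices rather
than those of [35] or [38].

```python
import numpy as np

def mutation_matrix(mu):
    """Assumed symmetric matrix: M[i, j] = mu^d (1 - mu)^(2 - d), d = Hamming distance."""
    m = np.empty((4, 4))
    for i in range(4):
        for j in range(4):
            d = bin(i ^ j).count("1")
            m[i, j] = mu ** d * (1 - mu) ** (2 - d)
    return m

def step(x, w, m, mu, r):
    """One generation of (44)-(47): selection and mutation, then recombination."""
    wbar = np.dot(w, x)                              # average fitness
    x_primed = m @ (w * x) / wbar                    # primed fractions
    d = w[0] * w[3] * x[0] * x[3] - w[1] * w[2] * x[1] * x[2]
    shift = r * (1 - 2 * mu) ** 2 * d / wbar ** 2
    return x_primed + shift * np.array([-1.0, 1.0, 1.0, -1.0])

def long_time_state(x_start, w, mu, r, generations=5000):
    """Iterate the map long enough to approximate a steady state."""
    m = mutation_matrix(mu)
    x = np.array(x_start, dtype=float)
    for _ in range(generations):
        x = step(x, w, m, mu, r)
    return x

w = np.array([1.0, 0.5, 0.5, 1.2])    # assumed fitnesses, W3 > W0 = 1 > W1 = W2
mu = 1e-3                             # assumed mutation probability per locus
for r in (0.0, 0.5):                  # no recombination and strong recombination
    from_valley_side = long_time_state([1, 0, 0, 0], w, mu, r)
    from_peak_side = long_time_state([0, 0, 0, 1], w, mu, r)
    print(r, from_valley_side.round(4), from_peak_side.round(4))
```

For these assumed parameters, the two initial conditions are expected to reach
the same state without recombination and different states for strong
recombination.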

Time to fixation: If the mutations are unidirectional with the probability
to mutate from 0 to 1 being $\mu$, and zero for the back mutation, the whole
population occupies the fittest sequence and the sequence {1,1} is said to be
fixed. In such a case, it is interesting to study the dynamics of the population
and more specifically, one can find the time $T$ to fixation.

For the one-way mutation scheme in which first selection takes place followed
by recombination and finally mutation, the time evolution occurs according to
the following nonlinear equations [21]:

$$ X_0(t+1) = \frac{(1-\mu)^2 W_0 X_0(t) - r(1-\mu)^2 D(t)}{\bar{W}(t)} \qquad (49) $$

$$ X_1(t+1) = \frac{\mu(1-\mu) W_0 X_0(t) + (1-\mu) W_1 X_1(t) + r(1-\mu)^2 D(t)}{\bar{W}(t)} \qquad (50) $$

$$ X_2(t+1) = \frac{\mu(1-\mu) W_0 X_0(t) + (1-\mu) W_2 X_2(t) + r(1-\mu)^2 D(t)}{\bar{W}(t)} \qquad (51) $$

$$ X_3(t+1) = \frac{\mu^2 W_0 X_0(t) + \mu\,(W_1 X_1(t) + W_2 X_2(t)) + W_3 X_3(t)}{\bar{W}(t)} - \frac{r(1-\mu)^2 D(t)}{\bar{W}(t)} \qquad (52) $$

Here $D(t) = [W_0W_3X_0(t)X_3(t) - W_1W_2X_1(t)X_2(t)]/\bar{W}(t)$ is the linkage
disequilibrium at time $t$ and $\bar{W}(t) = \sum_{k=0}^{3} W_k X_k(t)$ is the
average fitness of the population. The above equations can be written down in a
manner analogous to the above cases. Since selection occurs before
recombination, on replacing $X_i(t)$ by $W_iX_i(t)/\bar{W}(t)$ on the RHS of
(40)-(43), the evolution equations with selection and recombination are
obtained. Finally the unidirectional mutation scheme is implemented.
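
The construction just described can be checked by composing the three steps
explicitly and comparing with the closed form (49)-(52). The following is a
sketch only; the function names, the index convention (0, 1, 2, 3 for {0,0},
{0,1}, {1,0}, {1,1}) and the parameter values are assumptions made for
illustration.

```python
import numpy as np

SIGNS = np.array([1.0, -1.0, -1.0, 1.0])   # sign pattern of (40)-(43)

def one_way_mutation_matrix(mu):
    """M[i, j]: probability that j becomes i when each 0 turns into 1 with probability mu."""
    m = np.zeros((4, 4))
    for j in range(4):
        for i in range(4):
            if i & j != j:                       # a 1 would have reverted to 0
                continue
            gained = bin(i ^ j).count("1")       # zeros that mutated to 1
            kept = 2 - bin(i).count("1")         # zeros that stayed 0
            m[i, j] = mu ** gained * (1 - mu) ** kept
    return m

def step_composed(x, w, mu, r):
    """Selection, then recombination (40)-(43), then one-way mutation."""
    y = w * x / np.dot(w, x)
    y = y + r * (y[1] * y[2] - y[0] * y[3]) * SIGNS
    return one_way_mutation_matrix(mu) @ y

def step_closed_form(x, w, mu, r):
    """Right hand sides of (49)-(52)."""
    wbar = np.dot(w, x)
    d = (w[0] * w[3] * x[0] * x[3] - w[1] * w[2] * x[1] * x[2]) / wbar
    c = r * (1 - mu) ** 2 * d
    return np.array([
        (1 - mu) ** 2 * w[0] * x[0] - c,
        mu * (1 - mu) * w[0] * x[0] + (1 - mu) * w[1] * x[1] + c,
        mu * (1 - mu) * w[0] * x[0] + (1 - mu) * w[2] * x[2] + c,
        mu ** 2 * w[0] * x[0] + mu * (w[1] * x[1] + w[2] * x[2]) + w[3] * x[3] - c,
    ]) / wbar

rng = np.random.default_rng(0)
x = rng.dirichlet(np.ones(4))               # random normalised frequencies
w = np.array([1.0, 0.5, 0.5, 1.2])          # assumed fitnesses
print(np.allclose(step_composed(x, w, 0.01, 0.2), step_closed_form(x, w, 0.01, 0.2)))
```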

The equations for the corresponding unnormalised populations $Z_k$'s defined
by dlj) can also be written. But due to the recombination term, the equations
for $Z_k$'s also remain nonlinear. An approximate method to handle these
dynamical nonlinear equations can be developed by noting that at any instant,
for small mutation rates, only one of the four populations dominates. Then the
dynamics of the populations $Z_k$'s can be divided into the following three
dynamical phases [21]:
(i) $Z_0 \gg Z_1, Z_3$ (phase I), (ii) $Z_1 \gg Z_0, Z_3$ (phase II) and
(iii) $Z_3 \gg Z_0, Z_1$ (phase III). Thus one can expand the equations for
unnormalised populations in powers of $Z_1/Z_0$, $Z_3/Z_0$ in phase I,
$Z_0/Z_1$, $Z_3/Z_1$ in phase II and similarly, $Z_0/Z_3$, $Z_1/Z_3$ in
phase III. The time at which a phase ends is obtained by matching the solutions
of the relevant populations in the two phases. The fixation time is then
obtained by summing over these phase times.

As mentioned above, there exists a critical recombination fraction $r_c$ beyond
which a population initially located at {0,0} cannot cross the intermediate
fitness valley and reach the double mutant fitness peak ^^^J. The inset of
Fig. 6 shows that the fixation time diverges as $r$ approaches the critical
recombination probability $r_c = (W_3 - 1)/W_3$. A simple calculation using the
method described above but ignoring the nonlinearities shows that the fixation
time diverges as $1/(r_c - r)$. However a more careful analysis that takes the
nonlinear terms into account shows that the fixation time is well approximated
by [21]

^ 1 hn 1. f Wsil-W^)ir,-r)^ \ _ ( 2r^^WfK 
^ ire - r)W3 [ \il-Wi+ nW3)W^K^ ) V (1 " - r)^ , 

(53) 

where the constant K (vc — r) ^ . Thus the fixation time decays slower than 
$1/(r_c - r)$ due to the logarithmic corrections (see Fig. 6).

Figure 6: Fixation time as a function of $r$ obtained using exact iteration (•)
and analytical result (×) given by (53). The solid line has a slope equal to -1.
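
The growth of the fixation time as $r$ approaches $r_c$ can be illustrated by
iterating (49)-(52) directly. The following is a minimal sketch, not the
procedure used to produce Fig. 6: the fitness values, the mutation probability,
the cap on the number of generations and, in particular, the operational
definition of fixation (the first generation at which $X_3 \geq 1/2$) are
assumptions made here for illustration.

```python
def step(x, w, mu, r):
    """One generation of (49)-(52) for x = (X0, X1, X2, X3)."""
    x0, x1, x2, x3 = x
    w0, w1, w2, w3 = w
    wbar = w0 * x0 + w1 * x1 + w2 * x2 + w3 * x3
    d = (w0 * w3 * x0 * x3 - w1 * w2 * x1 * x2) / wbar
    c = r * (1 - mu) ** 2 * d
    return (
        ((1 - mu) ** 2 * w0 * x0 - c) / wbar,
        (mu * (1 - mu) * w0 * x0 + (1 - mu) * w1 * x1 + c) / wbar,
        (mu * (1 - mu) * w0 * x0 + (1 - mu) * w2 * x2 + c) / wbar,
        (mu ** 2 * w0 * x0 + mu * (w1 * x1 + w2 * x2) + w3 * x3 - c) / wbar,
    )

def fixation_time(w, mu, r, threshold=0.5, max_generations=20000):
    """First generation with X3 >= threshold (assumed definition), else None."""
    x = (1.0, 0.0, 0.0, 0.0)            # population initially at {0,0}
    for t in range(1, max_generations + 1):
        x = step(x, w, mu, r)
        if x[3] >= threshold:
            return t
    return None                          # no fixation within the allotted time

w = (1.0, 0.5, 0.5, 1.2)                # assumed fitnesses, W3 > W0 = 1 > W1 = W2
mu = 1e-3                               # assumed mutation probability
r_c = (w[3] - 1.0) / w[3]               # critical value quoted in the text
for r in (0.0, 0.05, 0.10, 0.15, 0.30):
    print(f"r = {r:.2f}, r_c - r = {r_c - r:+.3f}, T = {fixation_time(w, mu, r)}")
```

With these assumed values $r_c \approx 0.167$, so the last recombination
probability in the loop lies above the critical value and the function is
expected to return `None` there.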

The population frequencies and fixation time can be analysed for other fit- 
ness schemes as well and a discussion can be found in [lH [21] . Although we 
have discussed the haploid case, the diploid problem has also been studied [7^. 
For studies on models that consider more than two loci, the reader may refer to 
[371 US]. 

5 Summary 

In this review, we have presented a brief (and incomplete) overview of
evolutionary processes and models in deterministically evolving populations. As
we have discussed, these systems are inherently nonlinear and difficult to
treat analytically. The same nonlinearity that makes them so difficult to
handle is also responsible for the complex behaviour of their solutions.
Multiple steady states and dynamic phase transitions are some of the
interesting features displayed by these models.

While these theoretical models of evolutionary biology have garnered inter- 
est amongst physicists and mathematicians, they have also been successful in 
predicting biological properties and explaining the experimental results quanti- 
tatively. It is hoped that the work integrated from various disciplines will take 
us closer to an understanding of the complex and continuous process of the 
evolution of life. 